Abstract. We consider a game played between a hider, who hides a static object in one of several possible 
positions in a bounded planar region, and a searcher, who wishes to reach the object by querying sensors placed 
in the plane. The searcher is a mobile agent, and whenever it physically visits a sensor, the sensor returns 
a random direction, corresponding to a half-plane in which the hidden object is located. We first present a 
novel search heuristic and characterize bounds on the expected distance covered before reaching the object. 
Next, we model this game as a large-dimensional zero-sum dynamic game and we apply a recently introduced 
randomized sampling technique that provides a probabilistic level of security to the hider. We observe that, 
when the randomized sampling approach is only allowed to select a very small number of samples, the cost of 
the heuristic is comparable to the security level provided by the randomized procedure. However, as we allow 
the number of samples to increase, the randomized procedure provides a higher probabilistic security level. 



1. Introduction 

Games of hide-and-seek have been part of folklore for centuries. In modern times, these 
games find relevance in modeling problems of protecting high-valued assets or seeking out potential threats. 
This paper is concerned with one such game between a hider, who hides a static object among candidate 
points in a region, and a seeker, who wishes to reach the object by covering minimum distance, while making 
use of information about the object's location obtained from a few sensors deployed in the region. 

A version of this problem without sensor measurements was addressed in our earlier work [BBHPD10], wherein 
we formalized the problem in the form of a static matrix game. We provided a randomized approach to obtain 
probabilistic levels of security for the hider. In this paper, we utilize the measurements from additional sensors, 
which effectively make this problem a dynamic game. In addition, we also analyze a novel heuristic for the 
seeker, and compare its performance to the randomized method. 

Related Work. Search theory has received a lot of attention over the past several decades. Classic works 
[Sto07], [Was02] address several problems involving the search of static as well as moving objects, under various 
assumptions on sensing abilities of the searcher. [BP02] proposes a search-theoretic approach based on rate of 
return maps to develop cooperative search plans for uninhabited air vehicles to detect stationary targets. 

The problem considered in this paper bears similarities to the area of acoustic source localization, which 
involves estimating the location of a source using time-of-arrival measurements (see [AKLV04], [BSL08]). 
Recently, [KSIH10] proposed protocols to route one or more unmanned aerial vehicles to collect time-of-arrival 
measurements from sensors deployed in an environment. [SMH03] presented an efficient approach to 
detect an object in a polygonal environment. One straightforward strategy to reach the object without the 
use of any sensor measurement is to visit every candidate point via the shortest path through all candidate 
points. However, the computational complexity of determining the shortest path scales exponentially with the 
number of candidate points [Bel62]. 



This material is based upon work supported in part by ARO MURI Grant number W911NF0910553, and in part by the Center 
of Excellence for Research DEWS, University of L'Aquila, Italy. 
Contributions. We consider a game played between a hider and a searcher. A static object, such as a 
treasure, is placed by the hider at one of $m$ given points, distributed independently and uniformly in a square 
region $\mathcal{E} \subset \mathbb{R}^2$. We assume that $s(m)$ directional sensors are deployed in $\mathcal{E}$ so as to minimize the maximum 
distance from any point in $\mathcal{E}$ to a sensor [SD96]. Each sensor has an associated line and returns binary information 
about the half-plane in which the treasure lies, without measurement error. Such a model arises when an array 
of microphones is used at each location [VMRL03]. The goal for a searcher, assumed to move with bounded 
maximum speed and to have simple integrator dynamics, is to reach the treasure in the shortest possible 
time (or equivalently, by covering the shortest possible distance), from a given starting point. On the other 
hand, the hider's goal is to make the searcher travel as much distance as possible until the treasure is reached. 

We first present a Divide-and-Search heuristic which involves: (i) moving to the sensor closest to the centroid 
of the sub-region that contains the treasure, (ii) updating the sub-region that contains the treasure based 
on the measurement received, and (iii) performing an exhaustive search through all the remaining candidate 
points, when no more sensor locations are present in the updated sub-region that contains the treasure. We 
provide a novel upper bound on the expected distance covered by the searcher and characterize the number 
of sensors required by this heuristic. The upper bound on the expected distance scales as $\log(m)$, which 
is a significant improvement over the open-loop approach of following the shortest path 
through the candidate points, the length of which scales as $\sqrt{m}$. 

We then model this game as a zero-sum feedback matrix game and apply the randomized approach introduced 
in [BBHPD10] for static matrix games, and extended in [BH11] to partial information dynamic games. The 
hider samples columns from a matrix, which represent different policies of the searcher, and computes a 
sampled security level and a policy to play the actual game. We first establish monotonicity of the security 
level of the game as a function of the number of policies of the searcher considered by the hider. A comparison 
of the security levels obtained through the randomized method with the lengths of the paths produced by the 
heuristic procedure reveals that if the hider becomes aware that the seeker is using the heuristic procedure, 
she can select (deterministic) locations for the treasure that will lead to very long searches by the seeker. In 
contrast, the randomized approach produces search strategies that make the task of the hider more challenging, 
effectively forcing her to use randomized hiding strategies. However, it is important to note that the randomized 
approach generates policies that are optimized for a specific geometry of the points and can be fragile with 
respect to changes in the positions of the points away from the positions for which the game was sampled. 

Organization. This paper is organized as follows. The problem formulation is presented in Section 2. Preliminary 
results are included in Section 3. The Divide-and-Search heuristic is presented and analyzed in Section 4. 
The formulation of this problem in the form of a dynamic game is presented in Section 5. Simulations of the 
heuristic and the randomized approach are included in Section 6. 

2. Problem formulation 

Consider a problem in which a seeker has to find a treasure that is placed at one of $m$ given points, distributed 
independently and uniformly in a square region $\mathcal{E} \subset \mathbb{R}^2$ with area denoted by $\mathrm{Area}(\mathcal{E})$. We consider deploying 
$s(m)$ directional sensors in $\mathcal{E}$ so that the maximum distance of any point in $\mathcal{E}$ from any sensor is minimized. 
In the following, each point will be regarded in terms of the corresponding index $i \in \mathcal{P} := \{1, 2, \dots, m\}$ for each 
candidate treasure point, $i \in \mathcal{S} := \{m + 1, \dots, m + s\}$ for each sensor point and $i = 0$ for the searcher's initial 
position, which is assumed to be at the centre of the square region. 

Each sensor $i$ has an associated straight line defined by $n_i'(p - p_i) = 0$, $i \in \mathcal{S}$, where the unit vector $n_i$ 
(normal to the straight line) is given and known, and $p_i$ belongs to the line; we assume that $n_i'(p_j - p_i) \neq 0$ 
for all $i \in \mathcal{S}$, $j \in \mathcal{P}$, $i \neq j$, and that the orientations of the unit vectors $n_i$ are distributed independently 
and uniformly in $[0, 2\pi]$. Whenever a sensor is visited, it returns binary information about the half-plane 
in which the treasure lies, as shown in Figure 1. More formally, a seeker visiting a sensor point $i$ will receive 
the observation $y = \mathrm{sgn}(n_i'(p_\theta - p_i)) \in \{-1, 1\}$, where $p_\theta$ denotes the treasure position. This will allow us to 
restrict the time horizon to at most $m + s$ points, $\mathcal{T} = \{0, 1, 2, \dots, m + s\}$, after which the treasure is certainly 
found. 
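
The measurement model above can be illustrated with a minimal Python sketch. The function names `sensor_reading` and `consistent_candidates` and the numerical values are hypothetical and chosen only for illustration; the sketch assumes, as in the text, that no candidate point lies exactly on a sensor line.

```python
def sensor_reading(normal, sensor_pos, treasure_pos):
    """Return sgn(n' (p_theta - p_i)), i.e. +1 or -1."""
    dx = treasure_pos[0] - sensor_pos[0]
    dy = treasure_pos[1] - sensor_pos[1]
    dot = normal[0] * dx + normal[1] * dy
    if dot == 0:
        # Excluded by the assumption n_i'(p_j - p_i) != 0.
        raise ValueError("point lies on the sensor line")
    return 1 if dot > 0 else -1


def consistent_candidates(normal, sensor_pos, reading, candidates):
    """Keep the candidates lying in the half-plane indicated by the reading."""
    return [p for p in candidates
            if sensor_reading(normal, sensor_pos, p) == reading]


# Illustrative data: three candidates, one sensor at (3, 3) with normal (1, 0).
candidates = [(1.0, 1.0), (4.0, 2.0), (2.0, 5.0)]
y = sensor_reading((1.0, 0.0), (3.0, 3.0), candidates[1])
remaining = consistent_candidates((1.0, 0.0), (3.0, 3.0), y, candidates)
```

In this example a single reading already rules out the two candidates on the other side of the sensor line.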

We briefly introduce some extra notation to better describe the strategy of the seeker. At time $t \in \mathcal{T}$, let 
$x(t)$ be the current location of the pursuer, with $x(0) = 0$ and $x(t) \in \mathcal{P} \cup \mathcal{S}$ for $t \geq 1$. The position $x(t + 1)$ is 
decided at time $t$ by a motion control action $u(t)$, namely $x(t + 1) = u(t)$; the pursuer visits the node $p_{x(t)}$ 
and gets the measurement $y(t) = \mathrm{sgn}(n_{x(t)}'(p_\theta - p_{x(t)})) \in \{-1, 1\}$, where $\theta \in \mathcal{P}$ is the treasure position. 

Let $X_k = \{x(t)\}_{t=0}^{k}$ and $Y_k = \{y(t)\}_{t=0}^{k}$ be the sequences of visited nodes and collected observations by the 
seeker, respectively, up to some time $k \geq 0$, with $y(0) := 0$. For the sake of simplicity, we assume we can 
write $X_{k+1} = X_k \cup \{x(k + 1)\}$ and $Y_{k+1} = Y_k \cup \{y(k + 1)\}$ with a slight abuse of notation. 
Figure 1. Problem formulation. A circle denotes a candidate treasure point, a square denotes 
a sensor and the dot denotes the searcher. In this figure, the searcher is at a sensor and has 
received a measurement corresponding to the treasure $\theta$ being at one of the circles in the 
shaded region. 

3. Preliminaries 

In this section, we summarize some preliminary results which we will use in the rest of the paper. 

3.1. Geometric Preliminaries. The first result by [SD96] gives an upper bound on the minimum of the $s$ 
distances from any point $z \in \mathcal{E}$. 

Lemma 3.1. The rectangular heuristic from [SD96] to place the $s$ sensors satisfies 

$$ \min_{i \in \{1, \dots, s\}} \|z - p_i\| \leq \frac{1}{\sqrt{2s}} \sqrt{\mathrm{Area}(\mathcal{E})} $$

for any $z \in \mathcal{E}$. 
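
As a sanity check of the bound in Lemma 3.1, the sketch below uses a square $k \times k$ grid of sensors placed at cell centres. This layout is an assumption made for illustration (it is not claimed to be the rectangular heuristic of [SD96]), and it requires $s = k^2$ to be a perfect square; the function names are hypothetical. For this layout the worst-case point is a corner of the square, at distance exactly $\sqrt{\mathrm{Area}(\mathcal{E})}/\sqrt{2s}$ from its nearest sensor.

```python
import math


def grid_sensors(side, k):
    """Centres of a k-by-k grid of equal cells in a side-by-side square."""
    cell = side / k
    return [((i + 0.5) * cell, (j + 0.5) * cell)
            for i in range(k) for j in range(k)]


def max_min_distance(side, sensors, resolution=60):
    """Largest distance from a lattice of test points to the nearest sensor."""
    worst = 0.0
    for a in range(resolution + 1):
        for b in range(resolution + 1):
            z = (side * a / resolution, side * b / resolution)
            nearest = min(math.dist(z, p) for p in sensors)
            worst = max(worst, nearest)
    return worst


side, k = 50.0, 3
sensors = grid_sensors(side, k)
# Right-hand side of Lemma 3.1 with Area = side^2 and s = k^2.
bound = math.sqrt(side * side) / math.sqrt(2 * len(sensors))
worst = max_min_distance(side, sensors)
```

The lattice of test points only approximates the supremum over the region, so the check is numerical evidence rather than a proof.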

The next result is well-known in computational geometry. 

Proposition 3.2 (Minimum enclosing rectangle). Given a convex polygon with area $A$, the minimum 
area-enclosing rectangle has area upper bounded by $2A$. 

The next result provides an upper bound on the length of the shortest path through all points in a convex 
polygon which is contained inside $\mathcal{E}$. 
Lemma 3.3 (Shortest path length bound). The length of the shortest path through $n$ points that are inside a 
convex polygon of area $A$ which is contained inside $\mathcal{E}$ is upper bounded by 

$$ 2\sqrt{An} + \frac{9}{\sqrt{2}} \sqrt{\mathrm{Area}(\mathcal{E})}. $$


Proof of Lemma 3.3. We provide a constructive proof as follows. First, consider the minimum area-enclosing 
rectangle of the given convex polygon. By Proposition 3.2, this rectangle has area upper bounded by $2A$. 
Now, let the smaller side of this rectangle have length equal to $B$ and the larger side have length equal to 
$H$. Define $h := H/B = 2A/B^2$. Now, using a result from [BSBH10], there exists a path that starts from 
the smaller side, passes through each of the $n$ points exactly once and terminates on the opposite side of the 
rectangle, and has length upper bounded by 

$$ B\left(\sqrt{2hn} + h + \frac{5}{2}\right) = 2\sqrt{An} + H + \frac{5}{2} B \leq 2\sqrt{An} + \frac{7}{\sqrt{2}} \sqrt{\mathrm{Area}(\mathcal{E})}, $$

where we used the fact that both $B$ and $H$ can be at most equal to $\sqrt{2\,\mathrm{Area}(\mathcal{E})}$. The result now follows 
since the vehicle would cover a distance of at most $\sqrt{2\,\mathrm{Area}(\mathcal{E})}$ to reach any point on the smaller side of the 
minimum area-enclosing rectangle of the given convex polygon. □ 

3.2. Randomized sampling for large zero-sum games. In [BBHPD10], we introduced the Sampled 
Saddle-Point (SSP) algorithm, in which players sample sub-matrices from the original $M \times N$ matrix $A$, 
solve smaller games and utilize the saddle-point policies so obtained against each other. The SSP algorithm 
can be summarized as follows. 

Let $\mathcal{B}^{k \times l}$ denote the set of $k \times l$ left-stochastic (0, 1)-matrices (i.e., matrices whose entries belong to the set 
$\{0, 1\}$ and whose columns add up to one). 



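Algorithm 1: Sampled Saddle-Point 
For $P_1$: 

• Select random matrices $\Gamma_1 \in \mathcal{B}^{M \times m_1}$ and $\Pi_1 \in \mathcal{B}^{N \times n_1}$ 

• Compute sub-matrix: $A_1 = \Gamma_1' A \Pi_1$ 

• Security policy: $y_1^* \in \arg\min_{y \in \mathcal{S}_{m_1}} \max_{z \in \mathcal{S}_{n_1}} y' A_1 z$ 

• Security value: $\bar{V}(A_1) = \max_{z \in \mathcal{S}_{n_1}} y_1^{*\prime} A_1 z$ 

For $P_2$: 

• Select random matrices $\Gamma_2 \in \mathcal{B}^{M \times m_2}$ and $\Pi_2 \in \mathcal{B}^{N \times n_2}$ 

• Compute sub-matrix: $A_2 = \Gamma_2' A \Pi_2$ 

• Security policy: $z_2^* \in \arg\max_{z \in \mathcal{S}_{n_2}} \min_{y \in \mathcal{S}_{m_2}} y' A_2 z$ 

• Security value: $\underline{V}(A_2) = \min_{y \in \mathcal{S}_{m_2}} y' A_2 z_2^*$ 

Play sampled policies: $y^* := \Gamma_1 y_1^*$, $z^* := \Pi_2 z_2^*$ 
Output: $y^{*\prime} A z^*$ 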



The SSP algorithm is $\epsilon$-secure for player $P_1$ with confidence $1 - \delta$ if 

$$ P_{\Gamma_1, \Pi_1, \Gamma_2, \Pi_2}\left(y^{*\prime} A z^* \leq \bar{V}(A_1) + \epsilon\right) \geq 1 - \delta. \tag{3.1} $$

The subscript in the probability measure $P$ emphasizes which random variables define the events that are 
being measured. To provide guarantees for specific policies/values, the following notions of security that refer 
to specific policies/values are introduced. The policy $y^*$ with value $\bar{V}(A_1)$ is $\epsilon$-secure for player $P_1$ with 
confidence $1 - \delta$ if 

$$ P_{\Gamma_2, \Pi_2}\left(y^{*\prime} A z^* \leq \bar{V}(A_1) + \epsilon \mid y^*, \bar{V}(A_1)\right) \geq 1 - \delta. \tag{3.2} $$

The following result provides a bound on the size of the sub-matrices for the players that guarantees $\epsilon$-security 
with $\epsilon = 0$. 

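Theorem 3.4 (Game independent bounds). Suppose that the matrices $\Gamma_1, \Pi_1, \Gamma_2, \Pi_2$ are statistically 
independent. Then, if $\Pi_1$ and $\Pi_2$ have identically distributed columns and 

$$ n_1 = \left\lceil \frac{m_1 + 1}{\delta} - 1 \right\rceil \bar{n}_2, $$

for some $\bar{n}_2 \geq n_2$, then the SSP algorithm is $\epsilon = 0$-secure for $P_1$ with confidence $1 - \delta$. 
If one further increases $n_1$ to satisfy 

$$ n_1 = \left\lceil \frac{1}{\delta} \left( \ln\frac{1}{\beta} + m_1 + \sqrt{2 m_1 \ln\frac{1}{\beta}} \right) \right\rceil \bar{n}_2, $$

then, with probability higher than $1 - \beta$, the policy $y^*$ with value $\bar{V}(A_1)$ is $\epsilon = 0$-secure for $P_1$ with confidence 
$1 - \delta$. 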


Suppose that, due to computational limitations, player $P_1$ cannot satisfy the bounds in Theorem 3.4 to obtain 
$\epsilon = 0$-security for a given level of confidence $1 - \delta$. One option to overcome this difficulty and maintain the 
same high level of confidence would be to accept a larger value for $\epsilon$. 

The following is an a-posteriori procedure for $P_1$. Let $e_j$ denote the $j$th element of the canonical basis of $\mathbb{R}^{k_1}$. 



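Algorithm 2: A-posteriori procedure 

• Pick values for $m_1$ and $n_1$ 

• Determine $y^*$ and $\bar{V}(A_1)$ using SSP algorithm 

• Select random matrix $\bar{\Pi}_1 \in \mathcal{B}^{N \times k_1}$ with the same distribution as $\Pi_1$ 

Output: $\bar{v} = \max_{j \in \{1, \dots, k_1\}} y^{*\prime} A \bar{\Pi}_1 e_j$ 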
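
The output step of the a-posteriori procedure can be sketched in a few lines of Python. The sketch assumes that the policy $y^*$ has already been computed (it is simply given as a list here), that the matrix is small enough to store explicitly, and that sampling a left-stochastic (0, 1)-matrix amounts to drawing column indices with replacement; the function name `a_posteriori_value` and the numbers are hypothetical.

```python
import random


def a_posteriori_value(A, y_star, k1, column_weights=None, rng=None):
    """Sample k1 columns of A and return max_j y*' A e_j over the sample."""
    rng = rng or random.Random(0)
    n_cols = len(A[0])
    # Drawing column indices with replacement plays the role of Pi_bar_1.
    sampled = rng.choices(range(n_cols), weights=column_weights, k=k1)

    def column_outcome(j):
        # y*' A e_j: expected outcome against the pure policy j.
        return sum(y_star[i] * A[i][j] for i in range(len(A)))

    return max(column_outcome(j) for j in sampled), sampled


# Illustrative 2-by-3 game with non-positive entries, as in Section 5.
A = [[-3.0, -1.0, -4.0],
     [-2.0, -5.0, -1.0]]
y_star = [0.5, 0.5]
v_bar, sampled = a_posteriori_value(A, y_star, k1=20)
```

Because only sampled columns are inspected, the returned value never exceeds the maximum of $y^{*\prime} A e_j$ over all columns.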



The following result provides an a-posteriori guarantee on this procedure. 

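Theorem 3.5 (A-posteriori bounds). Suppose that the matrices $\Gamma_1$, $\Pi_1$, $\Pi_2$ are statistically independent. 
If $\Pi_1$ and $\Pi_2$ have identically distributed columns and 

$$ k_1 = \left\lceil \frac{1}{\delta} - 1 \right\rceil \bar{n}_2, \tag{3.3} $$

for some $\bar{n}_2 \geq n_2$, then the SSP algorithm is $\epsilon$-secure for $P_1$ with confidence $1 - \delta$ for any $\epsilon \geq \bar{v} - \bar{V}(A_1)$. 
If one further increases $k_1$ to satisfy 

$$ k_1 = \left\lceil \frac{\ln(1/\beta)}{\ln(1/(1 - \delta))} \right\rceil \bar{n}_2, $$

then, with probability higher than $1 - \beta$, the policy $y^*$ with value $\bar{V}(A_1)$ is $\epsilon$-secure for $P_1$ with confidence $1 - \delta$. 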
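
To give a feel for the sample sizes involved, the sketch below evaluates the two a-posteriori sample sizes, read here as $k_1 = \lceil 1/\delta - 1 \rceil \bar{n}_2$ and $k_1 = \lceil \ln(1/\beta) / \ln(1/(1-\delta)) \rceil \bar{n}_2$. This reading of the formulas is an assumption of the sketch, and the function names are hypothetical.

```python
import math


def k1_algorithm_level(delta, n2_bar):
    """k1 = ceil(1/delta - 1) * n2_bar (assumed form of the first bound)."""
    return math.ceil(1.0 / delta - 1.0) * n2_bar


def k1_policy_level(delta, beta, n2_bar):
    """k1 = ceil(ln(1/beta) / ln(1/(1 - delta))) * n2_bar (assumed form)."""
    ratio = math.log(1.0 / beta) / math.log(1.0 / (1.0 - delta))
    return math.ceil(ratio) * n2_bar


# Illustrative values only.
k_alg = k1_algorithm_level(delta=0.25, n2_bar=10)
k_pol = k1_policy_level(delta=0.5, beta=0.3, n2_bar=10)
```

Both quantities grow as $\delta$ decreases, which matches the observation later in the paper that $k_1$ tends to infinity as $\delta$ tends to zero.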



4. The Divide-and-Search Heuristic 



We now present and analyze a novel heuristic for the searcher. This heuristic involves computing the centroid 
of a convex polygonal region at every iteration, and then moving to the sensor closest to the centroid. At 
every iteration, the polygonal region is updated using the measurement from the sensor. If no such sensor 
exists, then the searcher performs an exhaustive search over the remaining candidate points in the region. 

Formally, the centroid $\mathrm{Cent}(Q)$ of a convex region $Q \subset \mathbb{R}^2$ is defined as the unique point $q$ in $Q$ that minimizes 
$\int_Q \|z - q\|^2 \, dz$. As per our measurement model, given an unknown treasure position $\theta$, any sensor $x(t) \in \mathcal{S}$ 



The confidence level $\beta$ for $P_1$ refers solely to the extraction of the matrix $\Pi_1$ and holds for any given matrix $\Gamma_1$. 



6 ALESSANDRO BORRI, SHAUNAK D. BOPARDIKAR, JOAO P. HESPANHA, MARIA D. DI BENEDETTO 

visited at time $t$, and a measurement $y(t) = \mathrm{sgn}(n_{x(t)}'(p_\theta - p_{x(t)}))$, let $H(x(t), y(t))$ denote the half-plane 
which contains the treasure. 

Let $K$ denote the number of measurements taken by the searcher. Then, the heuristic is described in 
Algorithm 3. 



Algorithm 3: Divide-and-Search 
Assumes: $\mathcal{E}_0 := \mathcal{E}$. 
For $t = 1, \dots, K$, 

• Go to the sensor $x(t)$ closest to $\mathrm{Cent}(\mathcal{E}_{t-1})$ 

• Obtain measurement $y(t)$ 

• Determine $\mathcal{E}_t = \mathcal{E}_{t-1} \cap H(x(t), y(t))$ 
end for 

Move on the shortest path through all targets in $\mathcal{E}_K$. 
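
The measurement loop of the heuristic can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the region is stored as a convex polygon (vertex list), the half-plane intersection is done with a standard polygon-clipping step, each sensor is visited at most once (an assumption added here so that repeated visits do not occur), and the final shortest-path computation through the surviving candidates is left out. All function names and numbers are hypothetical.

```python
import math


def polygon_centroid(poly):
    """Area centroid of a simple polygon given as a list of (x, y) vertices."""
    a = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return (cx / (3.0 * a), cy / (3.0 * a))


def signed_side(normal, origin, p):
    """n' (p - origin): positive on one side of the sensor line."""
    return normal[0] * (p[0] - origin[0]) + normal[1] * (p[1] - origin[1])


def clip_half_plane(poly, normal, origin, sign):
    """Part of a convex polygon where sign * n'(p - origin) >= 0."""
    out = []
    for p, q in zip(poly, poly[1:] + poly[:1]):
        sp = sign * signed_side(normal, origin, p)
        sq = sign * signed_side(normal, origin, q)
        if sp >= 0:
            out.append(p)
        if sp * sq < 0:  # the edge crosses the sensor line
            t = sp / (sp - sq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out


def divide_and_search(region, sensors, normals, treasure, candidates, K):
    """Run up to K measurement steps of the heuristic."""
    visited, remaining = [], list(candidates)
    for _ in range(K):
        unvisited = [k for k in range(len(sensors)) if k not in visited]
        if not unvisited:
            break
        c = polygon_centroid(region)
        i = min(unvisited, key=lambda k: math.dist(sensors[k], c))
        y = 1 if signed_side(normals[i], sensors[i], treasure) > 0 else -1
        region = clip_half_plane(region, normals[i], sensors[i], y)
        remaining = [p for p in remaining
                     if y * signed_side(normals[i], sensors[i], p) > 0]
        visited.append(i)
    return visited, remaining, region


# Illustrative data: a 10-by-10 square, two sensors, four candidates.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
sensors = [(5.0, 5.0), (7.5, 5.0)]
normals = [(1.0, 0.0), (0.0, 1.0)]
candidates = [(1.0, 1.0), (8.0, 2.0), (9.0, 8.0), (6.0, 4.0)]
visited, remaining, region = divide_and_search(
    square, sensors, normals, (8.0, 2.0), candidates, K=2)
```

In this example two measurements reduce four candidates to two, and the exhaustive search is then restricted to the surviving points.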



Figure 2 shows snapshots of a numerical implementation of the heuristic, with a high number of nodes. The 
following is a useful property of Algorithm 3. 

Lemma 4.1 (Upper bound on $\mathrm{Area}(\mathcal{E}_K)$). Assume that the sensors are placed using the rectangular heuristic 
by [SD96]. Then, at the end of Algorithm 3, 

$$ E[\mathrm{Area}(\mathcal{E}_K)] \leq \left( \left(\frac{2}{3}\right)^K + \frac{3}{\sqrt{2s}} \right) \mathrm{Area}(\mathcal{E}), \tag{4.1} $$

where the expectation is over the distribution of the measurements obtained by the sensors. 

Proof of Lemma 4.1. From Algorithm 3, it is immediate that each $\mathcal{E}_t$ is convex. Consider the region $\mathcal{E}_{t-1}$ as 
shown in Figure 3. We obtain a measurement $y(t)$ at a sensor closest to $\mathrm{Cent}(\mathcal{E}_{t-1})$, as shown in Figure 2. Let 
$l$ denote a line that passes through $\mathrm{Cent}(\mathcal{E}_{t-1})$ and which is perpendicular to $n_{x(t)}$. Using the centroid property 
[BH02] that any line through the centroid of a convex region divides the region into two parts such that the 
area of each part is at most two-thirds of the area of the original convex region, we obtain 

$$ \mathrm{Area}(\mathcal{E}_t) \leq \frac{2}{3} \mathrm{Area}(\mathcal{E}_{t-1}) + d_t \sqrt{\mathrm{Area}(\mathcal{E})}, $$

where $d_t$ is the distance of the closest sensor to $\mathrm{Cent}(\mathcal{E}_{t-1})$ and the second term is the area of the rectangular 
strip between the two lines. Using Lemma 3.1, 

$$ \mathrm{Area}(\mathcal{E}_t) \leq \frac{2}{3} \mathrm{Area}(\mathcal{E}_{t-1}) + \frac{1}{\sqrt{2s}} \mathrm{Area}(\mathcal{E}), $$

and therefore 

$$ E[\mathrm{Area}(\mathcal{E}_t)] \leq \frac{2}{3} E[\mathrm{Area}(\mathcal{E}_{t-1})] + \frac{1}{\sqrt{2s}} \mathrm{Area}(\mathcal{E}), $$

where the expectation is with respect to the sensor measurements. By using this inequality recursively, and 
from the fact that $1 - (2/3)^K < 1$, we obtain (4.1). □ 
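
The recursive step at the end of the proof can be checked numerically. The sketch below unrolls the recursion $a_t = \frac{2}{3} a_{t-1} + \frac{1}{\sqrt{2s}} \mathrm{Area}(\mathcal{E})$ with equality (the worst case allowed by the inequality) and compares it with the closed-form bound, read here as $((2/3)^K + 3/\sqrt{2s})\,\mathrm{Area}(\mathcal{E})$; the function names are hypothetical.

```python
import math


def area_bound_recursion(K, s, area=1.0):
    """Unroll a_t = (2/3) a_{t-1} + area / sqrt(2 s), starting from a_0 = area."""
    c = area / math.sqrt(2 * s)
    a = area
    for _ in range(K):
        a = (2.0 / 3.0) * a + c
    return a


def area_bound_closed_form(K, s, area=1.0):
    """Closed-form bound ((2/3)^K + 3 / sqrt(2 s)) * area."""
    return ((2.0 / 3.0) ** K + 3.0 / math.sqrt(2 * s)) * area


# The unrolled recursion never exceeds the closed form: the geometric sum
# c * (1 - (2/3)^K) / (1 - 2/3) is at most 3 c.
checks = [area_bound_recursion(K, s) <= area_bound_closed_form(K, s) + 1e-12
          for K in range(20) for s in (1, 10, 100)]
```

The gap between the two quantities is exactly $3c\,(2/3)^K$ with $c = \mathrm{Area}(\mathcal{E})/\sqrt{2s}$, so the closed form is tight for large $K$.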

We now present the main result of this section, which provides an upper bound on the expected distance 
covered by the searcher using the Divide-and-Search heuristic, and also characterizes the number of sensors 
required. Let D denote the distance covered by the searcher until the treasure is found. 

Theorem 4.2 (Performance of Algorithm 3). Suppose that: 

(1) the number of candidate treasure locations $m$ and the number of sensors $s$ satisfy $s = \lceil m/(\ln(m))^2 \rceil$; 

(2) the sensors are located using the rectangular heuristic from [SD96]. 

Figure 2. The Divide-and-Search heuristic. The big solid dot is the searcher, the blue (light) 
circles denote target points wherein the treasure is not present, the red (dark) circles denote 
the candidate treasure locations, the solid red dot denotes the exact location of the treasure, 
the + denotes the centroid of the region $\mathcal{E}_t$ at each iteration, the squares denote the sensors. 
The shortest path is illustrated by a dashed line in the final figure. 



Then, the expected value of $D$ using Algorithm 3 satisfies 

iv/A^ < nD] < ^/^ln(V2 ln(3/2)>|) +^V2 + V21n(3/2) InM ^^^^ 

where the expectation is with respect to the joint distribution of the measurements and the candidate treasure 
locations. 



Proof of Theorem 4.2. Let $\mathrm{dist}(K)$ denote the distance covered by the vehicle by using Algorithm 3. Clearly, 
$D \leq \mathrm{dist}(K)$, which is a random variable that is upper bounded by the sum of two terms. The first is 
$K$ times the diameter of $\mathcal{E}$, and the second is the length of the shortest path through the remaining $m_{\mathrm{rem}}$ 
candidate locations in $\mathcal{E}_K$. Using Lemma 3.3, 

$$ \mathrm{dist}(K) \leq K \sqrt{2\,\mathrm{Area}(\mathcal{E})} + 2\sqrt{m_{\mathrm{rem}} \, \mathrm{Area}(\mathcal{E}_K)} + \frac{9}{\sqrt{2}} \sqrt{\mathrm{Area}(\mathcal{E})}. $$

Figure 3. Illustrating proof of Lemma 4.1. The dashed line is perpendicular to $n_{x(t)}$ and 
passes through the centroid of $\mathcal{E}_{t-1}$. By the centroid property, the area of the shaded region 
is at most two-thirds of the area of $\mathcal{E}_{t-1}$. 



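Taking expectation with respect to the joint distribution of the measurements from the sensors and the target 
points, 

$$ \begin{aligned}
E[D] &\leq \frac{2K + 9}{\sqrt{2}} \sqrt{\mathrm{Area}(\mathcal{E})} + 2 E\left[\sqrt{m_{\mathrm{rem}} \, \mathrm{Area}(\mathcal{E}_K)}\right] \\
&= \frac{2K + 9}{\sqrt{2}} \sqrt{\mathrm{Area}(\mathcal{E})} + 2 E\left[ E\left[\sqrt{m_{\mathrm{rem}} \, \mathrm{Area}(\mathcal{E}_K)} \,\middle|\, \mathrm{Area}(\mathcal{E}_K)\right] \right] \\
&\leq \frac{2K + 9}{\sqrt{2}} \sqrt{\mathrm{Area}(\mathcal{E})} + 2 E\left[ \sqrt{\mathrm{Area}(\mathcal{E}_K)} \sqrt{E[m_{\mathrm{rem}} \mid \mathrm{Area}(\mathcal{E}_K)]} \right],
\end{aligned} $$

where the second step follows from the law of iterated expectations, and the third step is due to the application 
of Jensen's inequality [Bre92] to the square-root function. In the above inequality, the outer expectation is 
with respect to the measurements while the inner expectation is with respect to the target points. Now, 
conditioned on the value of $\mathrm{Area}(\mathcal{E}_K)$, the only information about the candidate locations that can be obtained 
is the existence of the treasure inside $\mathcal{E}_K$. The remaining $m - 1$ locations are distributed independently and 
thus, the expected number of target points inside $\mathcal{E}_K$ satisfies 

$$ E[m_{\mathrm{rem}} \mid \mathrm{Area}(\mathcal{E}_K)] = 1 + \frac{m - 1}{\mathrm{Area}(\mathcal{E})} \mathrm{Area}(\mathcal{E}_K) \leq 1 + \frac{m \, \mathrm{Area}(\mathcal{E}_K)}{\mathrm{Area}(\mathcal{E})}. $$

On substitution, using $\sqrt{1 + x} \leq 1 + \sqrt{x}$ for $x \geq 0$, and since $\sqrt{\mathrm{Area}(\mathcal{E}_K)} \leq \sqrt{\mathrm{Area}(\mathcal{E})}$, we have 

$$ E[D] \leq \left( \frac{2K + 9}{\sqrt{2}} + 2 \right) \sqrt{\mathrm{Area}(\mathcal{E})} + \frac{2\sqrt{m}}{\sqrt{\mathrm{Area}(\mathcal{E})}} E[\mathrm{Area}(\mathcal{E}_K)]. $$

Using Lemma 4.1, we have 

$$ E[D] \leq \left( \frac{2K + 9}{\sqrt{2}} + 2 + 2\sqrt{m} \left(\frac{2}{3}\right)^K + 3\sqrt{2} \ln(m) \right) \sqrt{\mathrm{Area}(\mathcal{E})}, \tag{4.2} $$

since by assumption, $s = \lceil m/(\ln(m))^2 \rceil$. Now, the right hand side of the above inequality is minimized with 

$$ K^* = \ln\left(\sqrt{2} \ln(3/2) \sqrt{m}\right) / \ln(3/2). $$

Substituting this value of $K$ in the right hand side of (4.2), we obtain the right hand side of the first inequality. 
The left hand side is trivially true because the treasure can be placed at the diametrically opposite end of the 
region $\mathcal{E}$. □ 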
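
The minimization over $K$ in the last step can be worked out explicitly. The derivation below is a sketch under two assumptions: $K$ is treated as a real variable, and the right hand side of (4.2) is read as $f(K)\sqrt{\mathrm{Area}(\mathcal{E})}$ with $f$ as defined in the first line.

```latex
\begin{aligned}
f(K) &:= \frac{2K + 9}{\sqrt{2}} + 2 + 2\sqrt{m}\left(\tfrac{2}{3}\right)^{K} + 3\sqrt{2}\,\ln m, \\
f'(K) &= \sqrt{2} - 2\sqrt{m}\,\ln\!\left(\tfrac{3}{2}\right)\left(\tfrac{2}{3}\right)^{K} = 0
  \iff \left(\tfrac{3}{2}\right)^{K^*} = \sqrt{2}\,\ln\!\left(\tfrac{3}{2}\right)\sqrt{m}, \\
K^* &= \frac{\ln\!\left(\sqrt{2}\,\ln(3/2)\,\sqrt{m}\right)}{\ln(3/2)}, \qquad
  2\sqrt{m}\left(\tfrac{2}{3}\right)^{K^*} = \frac{\sqrt{2}}{\ln(3/2)}, \\
f(K^*) &= \frac{\sqrt{2}}{\ln(3/2)}\left(1 + \ln\!\left(\sqrt{2}\,\ln(3/2)\,\sqrt{m}\right)\right)
  + \frac{9}{\sqrt{2}} + 2 + 3\sqrt{2}\,\ln m .
\end{aligned}
```

Since $f$ is convex in $K$, the stationary point is a minimizer, and $f(K^*)$ grows as $\ln m$, which is the logarithmic scaling claimed for the heuristic.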
Remark 4.3 (Number of sensors). Theorem 4.2 implies that with at most $\lceil m/(\ln(m))^2 \rceil$ sensors, 
and with a number of steps $K^*$ which grows logarithmically as a function of $m$, Algorithm 3 may lead to an 
average cost which is at most $\ln(m)$ times the optimal. 

5. Formulation as a Matrix Game 

Consider an output feedback zero-sum dynamic game where $P_1$ hides a non-moving object (treasure) in one 
of $m$ points $\{p_i\}_{i=1}^{m}$ on the plane and $P_2$ has to find the treasure with minimum cost, by traveling from point 
to point until she finds it, starting at $p_0$. At each step, $P_2$ can either visit one of the $m$ points, trying to guess 
the treasure position, or visit one of $s(m)$ sensor points to get better information. The game is 
played over the set of mixed policies: 

• $P_1$ chooses a probability distribution $y \in \mathcal{S}_m$ for the treasure over the $m$ points; 

• $P_2$ chooses a probability distribution $z \in \mathcal{S}_n$ over the set of all the possible $n$ functions of the available 
information, mapping the information available to $P_2$ at some time step to the action to be taken at 
that step. Specifically, the information available consists of the sequence of points visited so far and 
the measurements collected from the sensors visited. The next action to take consists of the index of 
the next point to visit. 

The game can be formulated as a matrix game with a very large number of columns, where the generic entry 
$A_{ij}$ corresponds to the Euclidean cost to find the treasure by playing policy $j$ when the treasure is placed in 
$i$, namely: 

$$ A_{ij} = -\sum_{k=0}^{k^*_{ij}} \left\| p_{\gamma_j(X_k, Y_k)} - p_{\gamma_j(X_{k-1}, Y_{k-1})} \right\|, \tag{5.1} $$

where $\gamma_j(X_{-1}, Y_{-1}) := 0$ for all $j$ (starting point) and the summation ends at the index $k^*_{ij}$ for which 
$\gamma_j(X_{k^*_{ij}}, Y_{k^*_{ij}})$ corresponds to the point $i$ where the treasure is hidden. The minus sign in Eq. (5.1) is needed 
to maintain consistency with the formulation in Subsection 3.2, where $P_1$ is the minimizer. Indeed, $P_1$ hides 
the treasure to maximize the distance and therefore to minimize the entries of $A$. 
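
To make the definition of a matrix entry concrete, the following Python sketch simulates one pure policy against one treasure location and returns the negated travelled distance. The representation is an assumption made for illustration: a policy is a function of the visited-node history and the observation history that returns the index of the next point, `measure` is a hypothetical callback returning the observation at a visited point, and index 0 is the starting point.

```python
import math


def matrix_entry(points, treasure, policy, measure, max_steps=100):
    """Negated Euclidean distance travelled until the treasure index is visited."""
    X, Y = [0], [0]  # x(0) = 0 is the starting point, y(0) := 0
    travelled = 0.0
    for _ in range(max_steps):
        nxt = policy(X, Y)
        travelled += math.dist(points[X[-1]], points[nxt])
        X.append(nxt)
        Y.append(measure(nxt))
        if nxt == treasure:
            return -travelled
    raise RuntimeError("policy did not reach the treasure")


# Illustrative data: start at the origin and two candidate points.
points = {0: (0.0, 0.0), 1: (3.0, 4.0), 2: (3.0, 0.0)}


def open_loop(X, Y):
    """Open-loop policy that ignores observations: visit 2, then 1."""
    return [2, 1][len(X) - 1]


entry_1 = matrix_entry(points, treasure=1, policy=open_loop, measure=lambda i: 0)
entry_2 = matrix_entry(points, treasure=2, policy=open_loop, measure=lambda i: 0)
```

Each pure policy of the searcher gives one column of the matrix, obtained by evaluating such an entry for every treasure location.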

The exact computation of the optimal mixed strategies is intractable because the size of the matrix is very 
large, in general. However, the results regarding the SSP algorithm have a computational complexity that is 
completely independent of the size of the game, which means that we can provide probabilistic guarantees for 
games with an arbitrarily large number of points. 

In this particular game, only the player $P_2$ has a very large number of options, so we can assume that both 
players consider all possible $m$ locations where $P_1$ can hide the treasure (all rows of $A$), but randomly select 
only a small number of pursuit strategies to construct their submatrices. This means that $P_2$ will never be 
surprised since she always considers all options for the actions of $P_1$. However, the player $P_1$ that hides the 
treasure should respect the bounds provided by Theorems 3.4 and 3.5 to avoid unpleasant surprises. 

The choice $m_1 = m$ in the SSP algorithm is particularly interesting to apply in games where the matrix 
is "fat", with many more columns than rows. In these cases, further results can be obtained for the SSP 
procedure, as shown in Subsection 5.1. 

5.1. SSP for "fat" matrix games. 

5.1.1. Sampled security level $\bar{V}(A_1)$. In the SSP algorithm, the sampled security level for $P_1$ is a random 
variable, given by 

$$ \bar{V}(A_1) = \min_{y \in \mathcal{S}_{m_1}} \max_{z \in \mathcal{S}_{n_1}} y' A_1 z. $$

In the case $m_1 = m$, $\bar{V}(A_1)$ only depends on $n_1$ and on the particular realization of the $n_1$ columns, and thus 
is a function of the random variable $\Pi_1$. Thus, 

$$ \bar{V}(A_1) = \min_{y \in \mathcal{S}_m} \max_{z \in \mathcal{S}_{n_1}} y' A_1 z = \max_{z \in \mathcal{S}_{n_1}} y^{*\prime} A_1 z = \max_{j \in \{1, \dots, n_1\}} y^{*\prime} A \Pi_1 e_j. $$

The following preliminary result essentially states that as the number of sampled columns increases, the 
sampled value $\bar{V}(A_1)$ is expected to increase, or in other words, the probability that $\bar{V}(A_1)$ is less than a fixed 
quantity decreases monotonically. 

Lemma 5.1. For any fixed $x \in \mathbb{R}$, $P_{\Pi_1}(\bar{V}(A_1) \leq x)$ is non-increasing with $n_1$. Further, if every column of 
$A$ is being sampled with a positive probability by the SSP algorithm, then, in the limit as $n_1$ goes to infinity, 
$\bar{V}(A_1)$ converges almost surely to the original value of the game $V(A)$. 



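Proof of Lemma 5.1. We begin with the first claim. 

$$ P_{\Pi_1}(\bar{V}(A_1) \leq x) = P_{\Pi_1}\left( \max_{j \in \{1, \dots, n_1\}} y^{*\prime} A \Pi_1 e_j \leq x \right) = \sum_{\hat{\Pi}_1} P_{\Pi_1}\left( \max_{j \in \{1, \dots, n_1\}} y^{*\prime} A \Pi_1 e_j \leq x \,\middle|\, \Pi_1 = \hat{\Pi}_1 \right) \cdot P_{\Pi_1}(\Pi_1 = \hat{\Pi}_1), $$

where $e_j$ is the $j$th element of the canonical basis of $\mathbb{R}^{n_1}$. 
Now, consider the concatenated matrix $\tilde{\Pi}_1 := [\Pi_1 \; e]$, where $e \in \mathcal{B}^{N \times 1}$, and let $\tilde{A}_1 := A \tilde{\Pi}_1$ be the corresponding sampled sub-matrix. Then, 

$$ \begin{aligned}
P_{\tilde{\Pi}_1}(\bar{V}(\tilde{A}_1) \leq x) &= \sum_{\hat{\Pi}_1, \hat{e}} P_{\Pi_1, e}\left( \max_{j \in \{1, \dots, n_1 + 1\}} y^{*\prime} A [\Pi_1 \; e] e_j \leq x \,\middle|\, \Pi_1 = \hat{\Pi}_1, e = \hat{e} \right) \cdot P_{\Pi_1}(\Pi_1 = \hat{\Pi}_1) \cdot P_e(e = \hat{e}) \\
&\leq \sum_{\hat{\Pi}_1} P_{\Pi_1, e}\left( \max_{j \in \{1, \dots, n_1\}} y^{*\prime} A \Pi_1 e_j \leq x \,\middle|\, \Pi_1 = \hat{\Pi}_1, e = \hat{e} \right) \cdot P_{\Pi_1}(\Pi_1 = \hat{\Pi}_1) \cdot \sum_{\hat{e}} P_e(e = \hat{e}) \\
&= \sum_{\hat{\Pi}_1} P_{\Pi_1}\left( \max_{j \in \{1, \dots, n_1\}} y^{*\prime} A \Pi_1 e_j \leq x \,\middle|\, \Pi_1 = \hat{\Pi}_1 \right) \cdot P_{\Pi_1}(\Pi_1 = \hat{\Pi}_1) \\
&= P_{\Pi_1}(\bar{V}(A_1) \leq x).
\end{aligned} $$

Thus, the first claim stands proved. 

For the second claim, let $e_j$ denote the $j$th element of the canonical basis of $\mathbb{R}^N$, and let $C_1$ denote the first 
column of $\Pi_1$. Now, the probability that $e_j$ is sampled is given by $P_{\Pi_1}(e_j \in \mathrm{Range}(\Pi_1))$. By assumption, 
$P_{C_1}(C_1 = e_j) > 0$, $\forall j \in \{1, \dots, N\}$. This implies that $P_{C_1}(C_1 \neq e_j) < 1$, $\forall j \in \{1, \dots, N\}$. For any $j$, the 
probability of $e_j$ being drawn in $n_1$ independent trials is 

$$ P_{\Pi_1}(e_j \in \mathrm{Range}(\Pi_1)) = 1 - P_{\Pi_1}(e_j \notin \mathrm{Range}(\Pi_1)) = 1 - P_{C_1}(C_1 \neq e_j)^{n_1}. $$

Since $P_{C_1}(C_1 \neq e_j) \in (0, 1)$ by assumption, the probability that $e_j$ does not get sampled goes to zero 
exponentially with $n_1$. This implies convergence in probability of $\bar{V}(A_1)$ to $V(A)$, and almost sure convergence is 
implied by the use of the Borel-Cantelli lemma [Res98]. □ 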
We now recall that the quantile of a random variable $X$ is the inverse function of the cumulative distribution 
function and returns the value below which random draws from the given distribution would fall $(\alpha \times 100)$ 
percent of the time. For any discrete random variable $X$ and a real number $\alpha \in [0, 1]$, the quantile is defined 
as 

$$ X_\alpha = \inf \{x \in \mathbb{R} : P(X \leq x) \geq \alpha\}, $$
and is a non-decreasing function of $\alpha$. 
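
For a discrete random variable with finitely many values, the quantile defined above can be computed by accumulating probabilities in increasing order of the values. The sketch below assumes $\alpha \in (0, 1]$ (for $\alpha = 0$ the infimum is not attained) and uses a small tolerance to absorb floating-point rounding; the function name and the numbers are hypothetical.

```python
def quantile(values, probs, alpha):
    """inf{x : P(X <= x) >= alpha} for a finite discrete distribution."""
    cumulative = 0.0
    for x, p in sorted(zip(values, probs)):
        cumulative += p
        if cumulative >= alpha - 1e-12:
            return x
    return max(values)


# Illustrative distribution of a (negative) game outcome.
values = [-7.0, -5.0, -3.0]
probs = [0.2, 0.5, 0.3]
q90 = quantile(values, probs, 0.9)
q50 = quantile(values, probs, 0.5)
q20 = quantile(values, probs, 0.2)
```

The three results illustrate that the quantile is non-decreasing in $\alpha$.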

The next result shows that, for any fixed $\alpha$, the quantile of the sampled value $\bar{V}(A_1)$ is a non-decreasing 
function of $n_1$. 

Proposition 5.2. For any $\alpha \in [0, 1]$, the quantile $\bar{V}_\alpha(A_1(n_1))$ of the sampled value of the game $\bar{V}(A_1(n_1))$ is 
a non-decreasing function of $n_1$. Moreover, in the limit as $n_1$ goes to infinity, $\bar{V}_\alpha(A_1(n_1))$ converges almost 
surely to the original value of the game, for every $\alpha \in [0, 1]$. 



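Proof of Proposition 5.2. For any $\alpha$ and $n_1$, consider the quantiles 

$$ \begin{aligned}
\bar{V}_\alpha(A_1(n_1)) &= \inf \{x \in \mathbb{R} : P(\bar{V}(A_1(n_1)) \leq x) \geq \alpha\}, \\
\bar{V}_\alpha(A_1(n_1 + 1)) &= \inf \{x \in \mathbb{R} : P(\bar{V}(A_1(n_1 + 1)) \leq x) \geq \alpha\}.
\end{aligned} $$

By definition of the quantile, we have 

$$ \begin{aligned}
&P(\bar{V}(A_1(n_1)) \leq \bar{V}_\alpha(A_1(n_1))) \geq \alpha, \\
&P(\bar{V}(A_1(n_1)) \leq \bar{V}_\alpha(A_1(n_1)) - \epsilon) < \alpha, \quad \text{for all } \epsilon > 0.
\end{aligned} $$

Now, assume that 

$$ \bar{V}_\alpha(A_1(n_1 + 1)) = \bar{V}_\alpha(A_1(n_1)) - \delta < \bar{V}_\alpha(A_1(n_1)), $$

for some $\delta > 0$, i.e., the quantile of $\bar{V}$ is strictly decreasing. Then, 

$$ \begin{aligned}
P(\bar{V}(A_1(n_1 + 1)) \leq \bar{V}_\alpha(A_1(n_1 + 1))) &= P(\bar{V}(A_1(n_1 + 1)) \leq \bar{V}_\alpha(A_1(n_1)) - \delta) \\
&\leq P(\bar{V}(A_1(n_1)) \leq \bar{V}_\alpha(A_1(n_1)) - \delta) < \alpha,
\end{aligned} $$

where we used Lemma 5.1. But, by the definition of quantile, 

$$ P(\bar{V}(A_1(n_1 + 1)) \leq \bar{V}_\alpha(A_1(n_1 + 1))) \geq \alpha, $$

which is a contradiction, and therefore $\bar{V}_\alpha(A_1(n_1))$ is a non-decreasing function of $n_1$. 
Furthermore, for all $\alpha > 0$, we have 

$$ \lim_{n_1 \to \infty} \bar{V}_\alpha(n_1) = \lim_{n_1 \to \infty} \inf \{x \in \mathbb{R} : P(\bar{V}(A_1(n_1)) \leq x) \geq \alpha\} = \inf \{x \in \mathbb{R} : P(V(A) \leq x) \geq \alpha\} = V(A), $$

where the limit and the infimum commute due to convergence of $\bar{V}(A_1(n_1))$ established in Lemma 5.1. □ 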

5.1.2. Outcome of the game $\bar{v}$. We now consider the expression of the a-posteriori outcome $\bar{v}$ (see 
Subsection 3.2), which is also a random variable: 

$$ \bar{v}(n_1, k_1, \Pi_1, \bar{\Pi}_1) = \max_{j \in \{1, \dots, k_1\}} y^{*\prime} A \bar{\Pi}_1 e_j, $$

where $k_1$ is given by Eq. (3.3), and we consider its quantile 

$$ \bar{v}_\alpha(n_1, k_1) := \inf \{x \in \mathbb{R} : P_{\Pi_1, \bar{\Pi}_1}(\bar{v}(n_1, k_1, \Pi_1, \bar{\Pi}_1) \leq x) \geq \alpha\}. $$

Analogous to the quantile of the sampled value of the game $\bar{V}(A_1)$, it can be shown that the quantile of $\bar{v}$ is 
monotonically non-decreasing as in the following result. 
Proposition 5.3. For any $\alpha \in [0, 1]$, the quantile of the a-posteriori outcome of the game $\bar{v}$ is a 
non-decreasing function of $k_1$, for any fixed $n_1$. Moreover, in the limit as $n_1$ and $k_1$ tend to infinity, $\bar{v}_\alpha$ converges 
in distribution to the original value of the game for any $\alpha \in [0, 1]$, i.e. 

$$ \lim_{n_1, k_1 \to \infty} \bar{v}_\alpha(n_1, k_1, \Pi_1, \bar{\Pi}_1) = V(A). $$

ni.ki — >-oo 

Remark 5.4. Note that, since for both the bounds provided by Theorems 3.4 and 3.5 we have $\lim_{\delta \to 0} k_1 = +\infty$, 
the previous results can be restated in terms of $\delta$; in particular $\bar{v}_\alpha(n_1, k_1)$ is non-increasing with $\delta$ for any 
fixed $n_1$, and in the limit ($\delta \to 0$, $n_1 \to +\infty$) it converges in distribution to the value of the game. 

Remark 5.5. It can be argued that the expectations of the a-posteriori outcome $\bar{v}$ and of the sampled value of 
the game $\bar{V}$ have the same monotonicity and convergence properties as the corresponding quantiles $\bar{v}_\alpha(n_1, k_1)$ 
and $\bar{V}_\alpha(n_1)$. 

6. Simulations 

We now present results of a numerical implementation of the randomized technique applied to the 
output-feedback hide-and-seek game described in Section 5. 

First, we chose a fixed geometry of $m = 10$ candidate treasure points and $s = 2$ sensor points, drawn uniformly 
from a 50-by-50 square region, and we ran Monte Carlo simulations of Algorithm 2 with the a-posteriori 
guarantees described in Theorem 3.5 for increasing values of $n_1$ up to the corresponding a-priori bound (see 
Theorem 3.4). We set $\bar{n}_2 = 10$, β = 2 · 10~^, and we repeated the simulations for different values of $\delta$ (or, 
equivalently, of $k_1$). Figure 4 shows the behavior of the curves $\bar{v}_\alpha(n_1, k_1)$ (for different values of $k_1$) and $\bar{V}_\alpha(n_1)$, 
as defined in Section 5.1, with $\alpha = 0.9$. For simplicity, we only show curves obtained using the a-posteriori 
bound in Equation (3.3). 

The plots endorse the monotonicity results obtained in Propositions 5.2 and 5.3; furthermore, the curves 
$\bar{v}_\alpha(n_1, k_1)$ look reasonably "flat", implying that with a choice of $n_1$ that is a few orders of magnitude lower 
than the a-priori bound, one can obtain a security strategy that has a relatively small value of $\epsilon$ with high 
probability (see [BBHPD10] for further considerations). 

To provide a comparison between the randomized method and the heuristic, we considered a different 
simulation setup and ran Monte Carlo simulations over 30 different geometries of $m = 10$ candidate treasure 
points, uniformly randomly distributed in a 50-by-50 square region, and $s = 2$ sensor points placed according 
to the rectangular heuristic from [SD96]. We applied Algorithm 1 with $\bar{n}_2 = 10$, β = 2 · 10"^, $\delta = 0.02$, and 
compared the sampled security value (in terms of quantile $\bar{V}_\alpha(n_1)$) to the cost of Algorithm 3 (heuristic), 
as shown in Figure 5. 

The heuristic cost represents the seeker's security level, namely the outcome when the seeker uses the heuristic
and the hider plays the best response to it by placing the treasure at the last point to be visited according to
the heuristic. Although this value is very good for the hider (i.e., very negative, which corresponds to a long
time to find the treasure), we observe a significant gap between this value (the black dotted line) and the SSP
curve (the magenta dash-dotted line), especially for large $n_1$. This means that, while the hider can expect a
long time until the treasure is found when playing against the heuristic policy, it should not expect such a
favorable outcome when playing against the SSP algorithm. Moreover, this particular hiding policy is very
fragile: if the seeker learns it, she can find the treasure in one step. Clearly, the heuristic policy
is not a Nash equilibrium. From the seeker's point of view, the heuristic provides a reasonable security level,
better than the ETSP cost and with much less computation, because the first part of Algorithm 3 (the 'K
steps') allows the exclusion of many points before computing the ETSP path (see Section 4). In general, an
advantage of the heuristic over the randomization method is that it does not require knowledge
of the entire geometry to determine the solution, but only the sensor locations, with the geometry needed only at the
last step.
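
The best-response argument can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the paper's Algorithm 3: the seeker is assumed to follow a fixed, known visiting order, sensor measurements are ignored, and the cost is reported as a positive travelled distance rather than with the hider's sign convention. The names `path_costs` and `hider_best_response` are hypothetical.

```python
import math


def path_costs(start, visit_order):
    """Cumulative distance travelled by the seeker when each point
    in visit_order is first reached, starting from start."""
    costs = []
    travelled = 0.0
    current = start
    for point in visit_order:
        travelled += math.dist(current, point)
        costs.append(travelled)
        current = point
    return costs


def hider_best_response(start, visit_order):
    """Return (index, cost) of the hiding point that maximizes the
    distance the seeker travels before reaching the treasure."""
    costs = path_costs(start, visit_order)
    worst = max(range(len(costs)), key=costs.__getitem__)
    return worst, costs[worst]
```

Because the cumulative distance never decreases along the path, the maximizing index is the last point of the visiting order whenever the points are distinct. The same observation shows why such a hiding policy is fragile: a seeker who knows it can go to that point first.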

Figure 4. Results of Monte Carlo simulations of Algorithm 2 (a-posteriori procedure) over
a fixed geometry of points in a planar region, for different confidence levels.



Note that the heuristic procedure was constructed assuming that the candidate treasure
points are uniformly distributed; other than that, it has not been optimized for any particular geometry
and should therefore be fairly robust to changes in the positions of the points. In contrast, the SSP technique
constructs feedback policies that are optimized for a specific geometry of the points and can therefore
be fragile with respect to changes in the positions of the points away from the positions for which the game
was sampled.



7. Conclusions and Future Directions 

In this paper, we proposed two different techniques to solve the problem of finding a static object in a planar
region when directional measurements are available to the seeker. First, we presented a search heuristic and
characterized bounds on the expected distance covered by the searcher; then we addressed the problem as a
large-dimensional game and applied recent results on randomized sampling to obtain security strategies that are
guaranteed with high probability. Simulation results show that, at the cost of more computation,
the randomized procedure provides the hider with a lower (better) security level, with probabilistic guarantees,
than the one provided by the search heuristic. On the other hand, the heuristic
can be implemented very efficiently, and its performance does not rely on a specific geometry of the points.

Future work will focus on extending the problem to the continuous case, by assuming that measurements are
available at any point of the plane and that the object can be placed anywhere in the plane; furthermore, the binary
information sensor models considered in this paper can be replaced by more sophisticated sensors that provide
continuous and noisy measurements.

Figure 5. Comparison between randomized technique and heuristic, over different geometries.
The randomization provides a lower security level than the heuristic, at the cost of
performing more computation.